Evolutionary minimization of the Rand index for speaker clustering

نویسندگان

  • Wei-Ho Tsai
  • Hsin-Min Wang
چکیده

ABSTRACT We propose an effective method for clustering unknown speech utterances based on their associated speakers. The method jointly optimizes the generated clusters and the required number of clusters by estimating and minimizing the Rand index. The metric reflects the clustering errors that arise when utterances from the same speaker are placed in different clusters; or when utterances from different speakers are placed in the same cluster. One useful characteristic of the Rand index is that its value only reaches the minimum when the number of clusters is equal to the size of the true speaker population. We approximate the Rand index by a function of the similarity measures between utterances and then use a genetic algorithm to determine the cluster in which each utterance should be located, such that the function is minimized. Our experiment results show that this novel speaker-clustering method outperforms conventional methods that use the Bayesian information criterion to determine the required number of clusters.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cluster Identification for Speaker

Cluster Identification is introduced as the process of jointly evaluating clustering and labelling schemes for cluster-labelling scheme selection. Normalized Rand and BBN metrics for comparing clustering performances across varied clustering and labelling schemes are presented. The merits of the metrics are evaluated and applied for speaker-environment tracking in Broadcast News.

متن کامل

Bilateral Weighted Fuzzy C-Means Clustering

Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy CMeans (BWFCM). We used a new objective function that uses some k...

متن کامل

Improved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity measure and application to Credit Scoring

In data mining, clustering is one of the important issues for separation and classification with groups like unsupervised data. In this paper, an attempt has been made to improve and optimize the application of clustering heuristic methods such as Genetic, PSO algorithm, Artificial bee colony algorithm, Harmony Search algorithm and Differential Evolution on the unlabeled data of an Iranian bank...

متن کامل

Integration of evolutionary computation algorithms and new AUTO-TLBO technique in the speaker clustering stage for speaker diarization of broadcast news

The task of speaker diarization is to answer the question "who spoke when?" In this paper, we present different clustering approaches which consist of Evolutionary Computation Algorithms (ECAs) such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO) algorithm, and Differential Evolution (DE) algorithm as well as Teaching-Learning-Based Optimization (TLBO) technique as a new optimizati...

متن کامل

خوشه‌بندی داده‌های بیان‌ژنی توسط عدم تشابه جنگل تصادفی

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data are typically involve in a large number of variables (genes), in comparison with number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations that are calculated by a distance function. As increa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computer Speech & Language

دوره 23  شماره 

صفحات  -

تاریخ انتشار 2009